
Inherent Weight Normalization in Stochastic Neural Networks

Neural Information Processing Systems

Multiplicative stochasticity such as Dropout improves the robustness and generalizability of deep neural networks. Here, we further demonstrate that always-on multiplicative stochasticity combined with simple threshold neurons provides a sufficient substrate for deep learning machines. We call such models Neural Sampling Machines (NSMs). We find that the probability of activation of the NSM exhibits a self-normalizing property that mirrors Weight Normalization, a previously studied mechanism that fulfills many of the features of Batch Normalization in an online fashion. The normalization of activities during training speeds up convergence by preventing the internal covariate shift caused by changes in the distribution of inputs. The always-on stochasticity of the NSM confers the following advantages: the network is identical in the inference and learning phases, making the NSM a suitable substrate for continual learning; it can exploit stochasticity inherent to a physical substrate, such as analog non-volatile memories for in-memory computing; and it is suitable for Monte Carlo sampling, while requiring almost exclusively addition and comparison operations. We demonstrate NSMs on standard classification benchmarks (MNIST and CIFAR) and event-based classification benchmarks (N-MNIST and DVS Gestures). Our results show that NSMs perform comparably to or better than conventional artificial neural networks with the same architecture.
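To make the self-normalizing property concrete, here is a minimal NumPy sketch; it is not the authors' implementation, the names nsm_layer and activation_probability are hypothetical, and the Bernoulli blank-out noise and Gaussian (CLT) approximation are assumptions made for illustration. The mean-to-standard-deviation ratio inside the Gaussian CDF is invariant to rescaling the weights, which is the Weight-Normalization-like effect the abstract describes.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def nsm_layer(x, W, p=0.5, threshold=0.0):
    """One stochastic forward pass: always-on multiplicative Bernoulli
    (blank-out) noise on every weight, followed by a hard threshold."""
    xi = rng.binomial(1, p, size=W.shape)   # always-on multiplicative noise
    u = (W * xi) @ x                        # noisy pre-activation
    return (u > threshold).astype(float)    # simple threshold neurons

def activation_probability(x, W, p=0.5, threshold=0.0):
    """Gaussian (CLT) approximation of P(output = 1). The mean/std ratio
    divides by the norm of the effective weights, so multiplying W by any
    c > 0 cancels out -- the Weight-Normalization-like effect."""
    mu = p * (W @ x)                           # mean of the noisy sum
    var = p * (1.0 - p) * ((W**2) @ (x**2))    # variance of the noisy sum
    return norm.cdf((mu - threshold) / np.sqrt(var + 1e-12))

x = rng.random(8)
W = rng.standard_normal((4, 8))
print(nsm_layer(x, W))                     # one stochastic sample of the outputs
print(activation_probability(x, W))        # expected firing rates
print(activation_probability(x, 10 * W))   # unchanged: the scale of W cancels
```

With a zero threshold, scaling W by any positive constant leaves the activation probabilities unchanged, so only the direction of each weight vector matters, mirroring the w/||w|| reparameterization in Weight Normalization.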


Nonlinear Sufficient Dimension Reduction with a Stochastic Neural Network

Neural Information Processing Systems

Sufficient dimension reduction is a powerful tool to extract the core information hidden in high-dimensional data and has many potentially important applications in machine learning tasks. However, the existing nonlinear sufficient dimension reduction methods often lack the scalability necessary for dealing with large-scale data. We propose a new type of stochastic neural network under a rigorous probabilistic framework and show that it can be used for sufficient dimension reduction for large-scale data. The proposed stochastic neural network is trained using an adaptive stochastic gradient Markov chain Monte Carlo algorithm, whose convergence is rigorously studied in the paper as well. Through extensive experiments on real-world classification and regression problems, we show that the proposed method compares favorably with the existing state-of-the-art sufficient dimension reduction methods and is computationally more efficient for large-scale data.
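As a rough illustration of the training style involved, the sketch below implements plain stochastic gradient Langevin dynamics (SGLD) on a toy Gaussian target. The paper's algorithm is an adaptive stochastic gradient MCMC variant with its own convergence analysis, so sgld_step here is a simplified stand-in, not the proposed method.

```python
import numpy as np

rng = np.random.default_rng(0)

def sgld_step(theta, grad_log_post, step_size):
    """One stochastic gradient Langevin dynamics update.
    grad_log_post: (minibatch) estimate of the gradient of the log posterior.
    The injected Gaussian noise, scaled as sqrt(step_size), is what makes
    this a posterior sampler rather than plain SGD."""
    noise = rng.normal(0.0, np.sqrt(step_size), size=theta.shape)
    return theta + 0.5 * step_size * grad_log_post(theta) + noise

# Toy target: standard Gaussian posterior, log p(theta) = -||theta||^2 / 2.
grad = lambda th: -th
theta = np.zeros(3)
samples = []
for t in range(1, 5001):
    theta = sgld_step(theta, grad, step_size=0.1 * t ** -0.55)  # decaying steps
    samples.append(theta.copy())
# Chain statistics: mean near 0, variance approaching 1 per coordinate.
print(np.mean(samples, axis=0), np.var(samples, axis=0))
```

The decaying step-size schedule follows the usual SGLD prescription; an adaptive SG-MCMC method would additionally adjust the dynamics on the fly, which is where the paper's convergence analysis comes in.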


A Probabilistic Framework for Nonlinearities in Stochastic Neural Networks

Neural Information Processing Systems

We present a probabilistic framework for nonlinearities, based on doubly truncated Gaussian distributions. By setting the truncation points appropriately, we are able to generate various types of nonlinearities within a unified framework, including sigmoid, tanh and ReLU, the most commonly used nonlinearities in neural networks. The framework readily integrates into existing stochastic neural networks (with hidden units characterized as random variables), allowing one for the first time to learn the nonlinearities alongside model weights in these networks. Extensive experiments demonstrate the performance improvements brought about by the proposed framework when integrated with the restricted Boltzmann machine (RBM), temporal RBM and the truncated Gaussian graphical model (TGGM).
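The central quantity here is the mean of a doubly truncated Gaussian, which turns the truncation points (a, b) into the shape of the nonlinearity. Below is a small numerical sketch, illustrative only and assuming unit variance; the paper's full treatment keeps the hidden units as random variables and learns the truncation points alongside the weights.

```python
import numpy as np
from scipy.stats import norm

def truncated_gaussian_mean(mu, a, b, sigma=1.0):
    """Mean of N(mu, sigma^2) truncated to [a, b], used as the activation:
    E[h] = mu + sigma * (phi(alpha) - phi(beta)) / (Phi(beta) - Phi(alpha)),
    where alpha = (a - mu) / sigma and beta = (b - mu) / sigma."""
    alpha, beta = (a - mu) / sigma, (b - mu) / sigma
    Z = norm.cdf(beta) - norm.cdf(alpha)           # truncation mass
    return mu + sigma * (norm.pdf(alpha) - norm.pdf(beta)) / np.maximum(Z, 1e-12)

u = np.linspace(-4, 4, 9)
print(truncated_gaussian_mean(u, 0.0, np.inf))   # smooth ReLU-like curve
print(truncated_gaussian_mean(u, 0.0, 1.0))      # sigmoid-like saturation in [0, 1]
print(truncated_gaussian_mean(u, -1.0, 1.0))     # tanh-like curve in [-1, 1]
```

Learning the nonlinearity then amounts to treating a and b as trainable parameters, so the same unit can interpolate between ReLU-, sigmoid-, and tanh-like behavior during training.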


The camera-ready version of the manuscript will be modified with all the changes and new results we describe

Neural Information Processing Systems

Thank you to all the reviewers for your in-depth comments. We fixed the typos and abbreviations indicated by Reviewer 1 in the text, and we now explain previously unexplained terms (e.g., the offset parameter to the noise). We improved the SI by elaborating on Figure 3, fixing the missing sections (4.3), and detailing the DVS Gestures benchmark. The abbreviation SNN (Stochastic Neural Network) was changed to StNN. Klambauer et al. in 2017 indeed introduced self-normalizing neural networks; our work is different in terms of objective and results. Regarding Eq. (7), our calculations confirm the derivative w.r.t.



Reviews: Inherent Weight Normalization in Stochastic Neural Networks

Neural Information Processing Systems

Related work is well cited across these fields, and the approach is unique to this reviewer's knowledge.

Quality: The quality of this work is adequate, though there are a couple of simple errors in the text (a misspelling in Figure 1, missing sections in the supplementary material, and a lack of explanation of some abbreviations such as W_3 and S2M). Overall, the text and derivation are done with high quality, and the tricks used in the derivation are called out to adequately describe the steps to the reader. The conclusions stand on their own, and genuine insight was needed to bridge stochastic neural networks, multiplicative weights, and weight normalization. The work could use more difficult datasets, though, to emphasize these results.


Reviews: Inherent Weight Normalization in Stochastic Neural Networks

Neural Information Processing Systems

The author rebuttal went a long way towards satisfying the reviewers, and all of them have recommended acceptance after discussions. The authors should go through the suggestions given by the reviewers (esp.

